Abstract

Quorum sensing (QS) regulates the onset of bacterial social responses as a function of cell density and has an important impact on
virulence. Autoinducer-2 (AI-2) is a signal that has the peculiarity of mediating both intra- and interspecies bacterial QS. We analyzed
the diversity of all components of AI-2 QS across 44 complete genomes of Escherichia coli and Shigella strains. We used phylogenetic
tools to study its evolution and determined the phenotypes of single-deletion mutants to predict phenotypes of natural strains. Our
analysis revealed many likely adaptive polymorphisms both in gene content and in nucleotide sequence. We show that all natural
strains possess the signal emitter (the luxS gene), but many lack a functional signal receptor (complete lsr operon) and the ability to
regulate extracellular signal concentrations. This result is in striking contrast with the canonical species-specific QS systems where one
often finds orphan receptors, without a cognate synthase, but not orphan emitters. Our analysis indicates that selection actively
maintains a balanced polymorphism for the presence/absence of a functional lsr operon, suggesting diversifying selection on the
regulation of signal accumulation and recognition. These results can be explained either by niche-specific adaptation or by selection
for a coercive behavior where signal-blind emitters benefit from forcing other individuals in the population to hasten cooperative
behaviors.

Key words: genome evolution, gene loss, E. coli, balancing selection, social cheater, bacteria signaling.



Introduction 

There is an increasing awareness of the importance of micro- 
bial social interactions (Crespi 2001; West et al. 2006, 2007; 
Foster et al. 2007). Although unicellular organisms, bacteria 
can express complex coordinated multicellular behaviors, such 
as biofilm formation, antibiotic production, and secretion of 
virulence factors. Some of these behaviors require a large 
quorum of cooperating bacteria to be effective, that is, high 
cell density. Quorum sensing (QS) is a key communication
system that coordinates cooperative behaviors in bacteria as a
function of cell density (Crespi 2001; Waters and Bassler 2005;
Keller and Surette 2006; West et al. 2006).

QS involves the production, secretion, and recognition of 
small signal molecules called autoinducers detected by cognate
receptors. Most autoinducers are species specific and
thus promote intra-specific communication (Waters and
Bassler 2005). An important exception is the AI-2 system
that uses as a signal a family of small molecules called 
autoinducer-2 (AI-2). The enzyme that produces AI-2 (LuxS) 
is present in both Gram-positive and Gram-negative bacteria. 
Because of the wide taxonomic distribution of LuxS, and the 
demonstration of the susceptibility of this system to interspe- 
cies interference, AI-2 has been proposed to be a signal pro- 
duced to mediate both intra- and interspecies communication 
(Surette et al. 1999; Chen et al. 2002; Xavier and Bassler 
2005a; Pereira, Thompson, et al. 2012). The substrate for
AI-2 synthesis by LuxS is S-ribosylhomocysteine (SRH), which
derives from the toxic intermediate S-adenosylhomocysteine
(SAH), a product of the metabolism of S-adenosylmethionine (SAM),
an important and ubiquitous central metabolite of the cell
(fig. 1) (Schauder et al. 2001; Winzer et al. 2002, 2003; Xavier
and Bassler 2003; De Keersmaecker et al. 2006). For this
reason, AI-2 can be considered a recycling product of SAM,
and it has been suggested that it might not be a true signaling
molecule in all AI-2-producing bacteria (Winzer, Hardie, et al.
2002; Winzer et al. 2002; Vendeville et al. 2005; Hardie and
Heurlier 2008).

Fig. 1. AI-2 biosynthetic pathway and Lsr-mediated transport and processing in Escherichia coli. (A) The precursor of AI-2 biosynthesis is SAM, an
essential compound in central metabolism used as a methyl donor for DNA, RNA, and proteins. Following methyl transfer from SAM to its various substrates,
the toxic compound SAH is formed. The Pfs enzyme removes adenine from SAH to form SRH. LuxS acts on SRH to produce homocysteine and AI-2, which is
released into the extracellular environment. (B1) AI-2 is bound by the periplasmic protein LsrB and internalized by the Lsr ATP-binding cassette transporter.
Intracellular AI-2 is phosphorylated by LsrK, and the phosphorylated form of the signal (P-AI-2) induces lsr transcription by inactivating the repressor of the lsr
operon (LsrR). This results in further assembly of the transporter and rapid AI-2 internalization. LsrF and LsrG proteins are also encoded by the lsr operon and
are required for the further processing of intracellular P-AI-2. (B2, B3) Shaded cells represent examples of strains that maintain production of AI-2 although
they lack the ability to sequester and process the extracellular AI-2 signal through the Lsr system (B2) or lack the Lsr system completely (B3). Pentagons
represent the AI-2 signal.

A major obstacle to understanding the role of this molecule as
a communication signal has been the lack of information on
the molecular mechanisms of AI-2 detection and signal transduction
networks in the majority of organisms. Importantly,
such mechanisms have now been well characterized in
Escherichia coli (reviewed in Pereira, Thompson, et al. 2012).
In this bacterium, LuxS produces AI-2 during active growth,
which is secreted into the extracellular medium where it accumulates
in a cell density-dependent manner until it triggers the activation
of the Lsr (for LuxS regulated) system in the receptor cells.
The genes of the lsr operon encode an ABC transporter
responsible for the internalization of AI-2 into the cells and
other enzymes that regulate the expression of the operon and
further intracellular metabolic degradation of the AI-2 signal
(fig. 1). As a result of the activation of this system, AI-2 levels in
the extracellular medium peak in mid-to-late exponential phase
and rapidly decline at the transition into stationary phase 
when the signal is removed from the environment (Wang, 
Hashimoto, et al. 2005; Wang, Li, et al. 2005; Xavier and 
Bassler 2005a, 2005b). By mediating the removal of AI-2 
from the environment, this process can potentially affect 
any individual cell in the vicinity with AI-2-dependent gene 
expression, independently of its species identity (Xavier and 
Bassler 2005a; Pereira et al. 2008). 

A recent study showed that the ability to bind and internalize
AI-2 signal via Lsr is not ubiquitous among E. coli strains.
Two E. coli strains were shown to lack many genes in the
lsr operon, and phenotypic assays confirmed lack of function
(Pereira et al. 2009). The finding of this unexpected polymorphism
led us to investigate the genetic diversity of the
AI-2 system among E. coli natural populations. Escherichia coli
is an important component of the mammalian gut microbiome,
especially during lactation, and is extremely diverse.
It comprises both commensal and pathogenic variants, with
different tropisms, and even some environmentally adapted
strains (Kaper et al. 2004; Tenaillon et al. 2010; Luo et al.
2011). The study of genetic variation in this species can thus
provide important information on the role of the interspecies
signal, AI-2, in an organism that coexists and interacts with
many different species in its natural habitat. In E. coli, AI-2 QS
regulates many social traits such as virulence (Zhu et al. 2007),
biofilm formation (Gonzalez-Barrios et al. 2006; Herzberg
et al. 2006; Reisner et al. 2006; Lee et al. 2011), and chemotaxis
and cell motility (Bansal et al. 2008; Hegde et al. 2011). If
the fine tuning of AI-2 concentration via LuxS production
and the Lsr system for AI-2 internalization is necessary to regulate
the behavior of E. coli and of other species in the mammalian
gut, the invasion of individuals that are impaired in signal production
or internalization could affect the microbiota species
composition and diversity. Such alterations of gut homeostasis
can facilitate infections (Garrett et al. 2010; Clemente et al.
2012).

In this study, we analyze the genetic diversity of AI-2 production,
detection, internalization, and processing at the gene
content and nucleotide levels using all completely sequenced
genomes of E. coli and Shigella natural strains. We use this
information to determine whether selective processes are
implicated in the evolution of this system. Many studies
have addressed the biochemical mechanisms or the experimental
evolution of QS. Oddly, there have been very few
studies on the natural genome diversity of QS. Analyses of
natural polymorphisms provide an important tool to understand
the selective pressures acting on the evolution of social
behaviors in microorganisms. The information provided by
comparative genomics of natural organisms, which focuses on
polymorphisms that have passed the filter of natural selection
through millions of generations in their natural habitats, is
ideal for studying the evolutionary relevance of genes and pathways.
Here, we took advantage of the large number of genomes
available from natural E. coli and Shigella strains to study
from a genome-wide perspective the evolution of polymorphism
of the different components of the AI-2 system. Our analysis
reveals that the AI-2 system follows a unique pattern of
genetic diversification that differs significantly from those of
species-specific QS systems.

Materials and Methods 

Genome Data 

We retrieved all complete genomes of E. coli and Shigella spp.
present in the KEGG database (http://www.kegg.jp/kegg/, last
accessed December 31, 2012) or in GenBank
(http://www.ncbi.nlm.nih.gov/genome/, last accessed December 31,
2012). Shigella spp. genomes were included in the analysis
because it is well accepted that these organisms belong to the
E. coli species (Ochman et al. 1983; Pupo et al. 2000; Escobar-Paramo
et al. 2003; Touchon et al. 2009). We excluded three
laboratory strains for the following reasons: 1) E. coli str. K-12
substr. W3110 is a very recent laboratory variant of MG1655
(included in the study); 2) E. coli str. K-12 substr. DH10B and
3) E. coli "BL21-Gold (DE3) pLysS AG" are genetically modified
laboratory strains with closely related nonmodified strains
in our data set (E. coli K-12 BW2952 [MC4100] and E. coli B
REL606, respectively). We added E. fergusonii as it is a
well-established outgroup of E. coli and Shigella strains
(Lawrence et al. 1991). In total, we included 45 strains (supplementary
table S1, Supplementary Material online), five of
which are laboratory strains that were included in the phylogenetic
reconstruction with the purpose of putting the natural
strains and the natural diversification in the context of the
better-studied laboratory strains. However, only natural strains
were used in the population genetic analyses. All the genomes
were downloaded on September 15, 2011. Gene content of
the lsr operon was manually checked in each strain to confirm
the presence of functional genes, identify misannotations, and
characterize pseudogenes (identification of truncations, stop
codons, and inserted sequences).

Phylogenetic Analysis and Character Evolution 

The list of orthologs between two genomes was identified 
using reciprocal best hits with more than 80% similarity in 
protein sequence and less than 20% difference in length, as 
in Rocha, Touchon, et al. (2006). The core genome of the 
clade was built using the intersection of lists of orthologs 
from the pairwise analyses. Each gene family in the core 
genome was aligned in protein sequence using MUSCLE 
(Edgar 2004) and then back-translated to DNA. The multiple 
alignments were concatenated, and phylogenetic reconstruction
was performed on this alignment. A maximum likelihood
distance matrix was built by Tree-puzzle 5.2 (Schmidt et al.
2002) under the Hasegawa-Kishino-Yano + Γ model. The
tree was inferred from the distance matrix using BIONJ
(Gascuel 1997). Support of the topology was estimated by
bootstrapping on the core genome concatenated alignment
(100×).
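
The ortholog and core-genome steps described above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the pipeline used in this study: the hit tuples, the function names (`best_hits`, `reciprocal_best_hits`, `core_genome`), the use of a single reference genome to index the core, and the definition of length difference relative to the longer protein are all hypothetical choices introduced here for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# One hit: (subject_id, percent_similarity, query_length, subject_length, score).
# This record layout is an assumption made for the example.
Hit = Tuple[str, float, int, int, float]
HitTable = Dict[str, List[Hit]]


def best_hits(hits: HitTable,
              min_similarity: float = 80.0,
              max_length_diff: float = 0.20) -> Dict[str, str]:
    """Keep, per query, its top-scoring hit if it passes both thresholds."""
    best: Dict[str, str] = {}
    for query, candidates in hits.items():
        if not candidates:
            continue
        subject, similarity, q_len, s_len, _ = max(candidates, key=lambda h: h[4])
        length_diff = abs(q_len - s_len) / max(q_len, s_len)
        if similarity > min_similarity and length_diff < max_length_diff:
            best[query] = subject
    return best


def reciprocal_best_hits(a_vs_b: HitTable, b_vs_a: HitTable) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


def core_genome(reference_genes: Set[str],
                ortholog_pairs: Iterable[Set[Tuple[str, str]]]) -> Set[str]:
    """Reference genes that have an ortholog in every compared genome."""
    core = set(reference_genes)
    for pairs in ortholog_pairs:
        core &= {ref_gene for ref_gene, _ in pairs}
    return core


# Hypothetical toy data: two genes in genome A, three in genome B.
a_vs_b = {
    "a1": [("b1", 95.0, 300, 310, 500.0), ("b2", 85.0, 300, 300, 200.0)],
    "a2": [("b3", 70.0, 200, 200, 150.0)],  # fails the similarity threshold
}
b_vs_a = {
    "b1": [("a1", 95.0, 310, 300, 500.0)],
    "b2": [("a1", 85.0, 300, 300, 200.0)],
    "b3": [("a2", 70.0, 200, 200, 150.0)],
}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Indexing the core by one reference genome keeps the intersection simple; an all-against-all intersection of pairwise lists, as described in the text, would give the same set whenever orthology is transitive.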

To analyze character evolution, we coded each strain in 
terms of presence/absence of complete operon, pathovar, 
and the ability to replicate within macrophages. The latter 
two variables were coded using information from the litera- 
ture. We traced the history of character change through the 
phylogeny with the program Mesquite version 2.75, build 566 
(Maddison and Maddison 2011). Ancestral state reconstruc- 
tion was made under maximum likelihood using an asymmet- 
ric Markov k-state two-parameter model, with rates estimated 



1 8 Genome Biol. Evol. 5(1): 16-30. doi:10.1093/gbe/evs122 Advance Access publication December 16, 2012 



Evolution of AI-2 Quorum Sensing in E. coli 



GBE 



from the data. Correlations were carried out with Pagel's
1994 test of independence between two binary characters
(Pagel 1994). This test estimates the log-likelihood difference
between a model where the rates of change in the two characters
are independent and a model where the rates of
change are correlated. The significance value of the log-likelihood
difference after 1,000 simulations is presented for each
correlation.

Character Simulations 

We simulated character evolution along the phylogenetic tree 
to test whether the polymorphism observed for the presence/ 
absence of a complete lsr operon evolves under neutrality. This
corresponds to the null model. We evolved 1,000 categorical 
(binary) characters in Mesquite (Maddison and Maddison 
2011) using the aforementioned two-parameter Markov-k 
model with asymmetric rates of forward and backward
changes. Forward changes depict the instantaneous rate
of gene-inactivating mutations leading to an incomplete, thus
nonfunctional, operon (from state 0 to state 1), and backward
changes describe the instantaneous rate of regaining
a functional operon by back mutation, gene conversion,
or lateral gene transfer (from state 1 to state 0). To
initiate the simulation process, one has to set the character 
frequencies at the root of the tree. We assumed those fre- 
quencies to be at equilibrium (rather than the alternative of 
equal frequencies). With this option, the expected frequencies 
at the root are assumed to be consistent with the model's 
rates, which is a more suitable option when the simulating 
model contains asymmetrical rates (Schluter et al. 1997). In 
this analysis, we calibrated the branch lengths of the phylogeny
into the same units as the model parameters by considering
the coalescent expectation that the time to the most
recent common ancestor is on the order of $2N_e$ generations
(Hudson 1990). Considering $\theta = 2N_e\mu$, with $\theta = 0.0187$
for E. coli estimated with the core genome (this study) and
$\mu = 8.9 \times 10^{-11}$ (Wielgoss et al. 2011), we estimate the time to the most
recent common ancestor to be approximately $2.11 \times 10^{8}$ generations,
which corresponds to 0.0124 total branch lengths as
estimated directly from the tree.
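
The calibration above is simple arithmetic, and a few lines of Python make it easy to check. This is a minimal sketch assuming the relation θ = 2Neμ and taking μ = 8.9 × 10^-11 per nucleotide per generation, the value reported by Wielgoss et al. (2011); the function and constant names are hypothetical.

```python
# Values quoted in the text.
THETA = 0.0187               # diversity estimated from the core genome
MU = 8.9e-11                 # mutation rate per nucleotide per generation
TOTAL_BRANCH_LENGTH = 0.0124 # branch length spanned by the same time interval


def tmrca_generations(theta: float, mu: float) -> float:
    """Coalescent expectation for the TMRCA: 2*Ne = theta / mu generations."""
    return theta / mu


def generations_per_branch_unit(theta: float, mu: float, branch_length: float) -> float:
    """Conversion factor from tree branch-length units to generations."""
    return tmrca_generations(theta, mu) / branch_length


t = tmrca_generations(THETA, MU)
scale = generations_per_branch_unit(THETA, MU, TOTAL_BRANCH_LENGTH)
print(f"TMRCA: {t:.3g} generations; {scale:.3g} generations per unit branch length")
```

With these inputs the quotient comes out at about 2.1 × 10^8 generations, in line with the value quoted in the text up to rounding.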

The simulations were carried out with rate parameters 
compatible with neutrality and with rate parameters esti- 
mated from the actual data. For each model, we estimated 
the level of polymorphism generated as the relative frequency 
of state 0 (functional operon), and the two distributions were 
tested for significant differences with a two-sample Kolmo- 
gorov-Smirnov test in R (http://www.R-project.org/, last 
accessed December 31, 2012). 
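
The comparison of the two simulated distributions was run in R; purely as an illustration of what the two-sample Kolmogorov-Smirnov statistic D measures, a self-contained Python sketch follows. The function name and the toy samples are hypothetical, and the sketch computes only D, not the associated P value.

```python
from bisect import bisect_right
from typing import Sequence


def ks_two_sample_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest absolute difference between the two empirical CDFs."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # Empirical CDFs are step functions, so the supremum of their
    # difference is reached at one of the observed values.
    for value in xs + ys:
        cdf_x = bisect_right(xs, value) / n
        cdf_y = bisect_right(ys, value) / m
        d = max(d, abs(cdf_x - cdf_y))
    return d


# Hypothetical frequencies of state 0 from two sets of simulated characters.
neutral_sample = [0.30, 0.33, 0.35, 0.36, 0.40]
fitted_sample = [0.10, 0.25, 0.36, 0.50, 0.70]
d_statistic = ks_two_sample_statistic(neutral_sample, fitted_sample)
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap), which is how the D values reported for the different models can be read.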

Genetic Diversity and Levels of Selection 

Standard analyses of genetic diversity and neutrality tests
(Tajima 1989; McDonald and Kreitman 1991; Stoletzki and
Eyre-Walker 2011) were carried out for each gene with DnaSP
5.10 (Librado and Rozas 2009) and MEGA4 (Tamura et al.
2007). The McDonald-Kreitman test detects selection on
protein-coding sequences by comparing divergence and polymorphism
data on synonymous and nonsynonymous sites under the
assumption that synonymous substitutions are neutral; the
test is robust to complex demography (Nielsen 2005) such
as that likely to occur in bacteria. Tajima's D compares two
measures of population genetic diversity that can be used to 
infer events of selection; but unlike the previous test, Tajima's 
D is highly sensitive to demographic effects (Simonsen et al. 
1995). The number of synonymous (nonsynonymous) substi- 
tutions per synonymous (nonsynonymous) sites was estimated 
in MEGA4 (Tamura et al. 2007) using the modified Nei- 
Gojobori method that assumes a transition/transversion rate 
bias and uses a Jukes-Cantor correction to account for mul- 
tiple substitutions at the same site. Standard errors were esti- 
mated after 1,000 bootstrap replicates. Patristic distances 
among strains were estimated in Mesquite (Maddison and 
Maddison 2011) by calculating the path-length distance 
from one strain to another along branches of the core genome
tree of figure 2.
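
The Jukes-Cantor correction mentioned above can be written down in a few lines. The sketch below shows only that correction step, applied to a proportion of differing sites; it does not implement the site counting of the modified Nei-Gojobori method, and the function name is hypothetical.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected number of substitutions per site.

    `p` is the observed proportion of differing sites. The correction
    d = -(3/4) * ln(1 - 4p/3) accounts for multiple substitutions at
    the same site and is undefined for p >= 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical example: 10% of synonymous sites differ between two strains.
d_s = jukes_cantor_distance(0.10)
```

For small p the corrected distance is close to p itself; the correction matters mostly for the more divergent comparisons.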

We analyzed codon-specific selection in the two genes
with significant McDonald-Kreitman tests. We estimated
the rates of nonsynonymous and synonymous changes at
each site using likelihood-based approaches as implemented
in the HyPhy package and made available through the
Datamonkey web service (Kosakovsky Pond and Frost 2005).
We made separate analyses with the fixed effects likelihood
methods (FEL and iFEL) and with the random effects likelihood
method (REL) (Pond and Frost 2005). iFEL is the "population-level"
version of FEL and applies when one is interested in
selective pressures that are restricted to interior branches of
the tree (Kosakovsky Pond et al. 2006). REL tends to be more
powerful than FEL but has a higher rate of false positives (Pond
and Frost 2005). For this reason, we ran all methods and
compared the results. A Bayes factor of 20 or more in favor
of dN > dS is usually considered as providing strong support for
adaptive selection at the site (Pond and Frost 2005). For all
these analyses, we estimated for each data set the best-fitting
nucleotide model and the gene phylogeny using the available
options on Datamonkey.
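
For readers unfamiliar with the McDonald-Kreitman test referred to above, the sketch below evaluates its 2 × 2 table (fixed versus polymorphic, nonsynonymous versus synonymous) with a two-sided Fisher exact test. This is an assumption made for illustration: the text does not state which statistic was used in DnaSP, and the function name and the example counts are hypothetical.

```python
from math import comb


def mk_fisher_p(dn: int, ds: int, pn: int, ps: int) -> float:
    """Two-sided Fisher exact P value for the McDonald-Kreitman table.

    Rows: fixed differences (dn, ds) and polymorphisms (pn, ps);
    columns: nonsynonymous and synonymous changes.
    """
    row1, row2, col1 = dn + ds, pn + ps, dn + pn
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x nonsynonymous fixed differences.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(dn)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_observed * (1 + 1e-9))


# Hypothetical counts, not taken from this study.
p_value = mk_fisher_p(dn=7, ds=2, pn=2, ps=10)
```

An excess of nonsynonymous fixed differences relative to nonsynonymous polymorphisms, as in the toy counts, is the pattern usually read as evidence of adaptive substitution.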

Fig. 2. The phylogenetic analysis of Escherichia coli and Shigella strains inferred with the core genome (1,524 genes, 1.6 Mb). The tree was inferred by
neighbor-joining based on the maximum likelihood distance matrix (see Materials and Methods). Escherichia fergusonii was used as the outgroup. Arrows
indicate presence/absence of luxS and genes in the lsr operon. Dashed lines represent pseudogenes. On the right, pathovar and phylogroups are indicated.
Pathovars are enteroaggregative E. coli (EAEC), enterohemorrhagic E. coli (EHEC), enterotoxigenic E. coli (ETEC), enteropathogenic E. coli (EPEC), adherent
invasive E. coli (AIEC), extraintestinal pathogenic E. coli (EXPEC), uropathogenic E. coli (UPEC), and avian pathogenic E. coli (APEC).

We characterized codon usage of all genes using several
statistics and averaging results across genomes: the effective
number of codons (ENC; Wright 1990), the Codon Bias Index
(CBI; Morton 1993), and the Codon Adaptation Index (CAI;
Sharp and Li 1987). We used as a reference set the codon
usage of a set of highly expressed genes in E. coli (Sharp and Li
1986). The expected value of CAI (eCAI) provides a direct threshold
value for discerning whether the differences in CAI are
statistically significant or whether they are merely artifacts
that arise from internal biases in the G+C composition and/or
amino acid composition of the query sequences (Puigbo
et al. 2008). eCAI is calculated from a set of query sequences
by generating random sequences with G+C and amino acid
content similar to those of the input. ENC and CBI were estimated
with DnaSP, and eCAI was estimated with the E-CAI server
available at http://genomes.urv.es/CAIcal/E-CAI (last accessed
December 31, 2012).
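
As a reminder of how the Codon Adaptation Index used in the codon usage analysis is defined, the following Python sketch computes it as the geometric mean of the relative adaptiveness of the codons in a gene (Sharp and Li 1987). The function names are hypothetical, and the reference counts are invented toy numbers covering only two amino acids, not the E. coli reference set used in this study.

```python
import math
from typing import Dict


def relative_adaptiveness(ref_counts: Dict[str, Dict[str, int]]) -> Dict[str, float]:
    """w(codon) = count of codon / count of the most used synonymous codon.

    `ref_counts` maps each amino acid to the codon counts observed in a
    reference set of highly expressed genes.
    """
    w: Dict[str, float] = {}
    for codon_counts in ref_counts.values():
        top = max(codon_counts.values())
        for codon, count in codon_counts.items():
            w[codon] = count / top
    return w


def cai(sequence: str, w: Dict[str, float]) -> float:
    """Geometric mean of w over the codons of `sequence` found in `w`."""
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    # Codons without a weight (e.g., Met, Trp, stop) are skipped.
    logs = [math.log(w[c]) for c in codons if w.get(c, 0.0) > 0.0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference counts for lysine and glutamate codons.
toy_reference = {
    "Lys": {"AAA": 30, "AAG": 10},
    "Glu": {"GAA": 40, "GAG": 20},
}
weights = relative_adaptiveness(toy_reference)
cai_preferred = cai("AAAGAA", weights)  # only preferred codons
cai_rare = cai("AAGGAG", weights)       # only non-preferred codons
```

A gene built only from the preferred codons scores 1, and the score falls as rarer synonymous codons are used, which is why CAI is read as a proxy for adaptation to the codon usage of highly expressed genes.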

Phenotypic Characterization of Null Mutants 

Wild-type E. coli K-12 strain MG1655 (Blattner et al. 1997)
was used as the parental strain (supplementary table S2,
Supplementary Material online). To construct chromosomal
single-gene deletions, kanamycin resistance cassettes from
the Keio collection of single-deletion mutants (obtained
from the National BioResource Project [Japan] [Baba et al.
2006]) were introduced into the desired gene of MG1655 by
phage P1 transduction, as described
elsewhere (Silhavy et al. 1984). All deletions were confirmed
by polymerase chain reaction using Taq DNA polymerase
(New England Biolabs).

The phenotypes of all single-deletion mutants were deter- 
mined in terms of AI-2 internalization by quantifying the time 
course of extracellular AI-2 concentration in cell-free super- 
natants. Bacteria were grown at 37°C with aeration in 
Luria-Bertani (LB) broth. To measure AI-2 concentrations, 
we followed the protocol described in Taga and Xavier
(2011). Briefly, we diluted overnight cultures (1:100) into
fresh LB medium and periodically collected aliquots, for which
we measured both optical density (OD600) and AI-2 concentration
using the in vitro assay based on the CLPY-FRET
method (Taga and Xavier 2011).

Results 

Diversity of the Gene Repertoire of AI-2 QS 

The phylogenetic relationship among E. coli and Shigella
strains was inferred with the core genome of all strains including
the outgroup (fig. 2). Every node in this phylogeny has 99-100%
bootstrap support with the exception of two small internal
nodes, one within phylogroup B1 with 92% and
another within B2 with 56%. This phylogeny shows the same
phylogroups as previous studies (Touchon et al. 2009). The
core genome of the clade (E. coli + E. fergusonii) comprises only
1,524 genes, corroborating previous conclusions that no
single strain can be regarded as highly representative of the
species and justifying the need for including gene repertoire
variation in population genetic studies of E. coli.

Figure 2 shows a remarkable pattern of presence/absence
polymorphism of the AI-2-regulated lsr genes among all
44 sampled strains of E. coli and Shigella. Many strains lack
a complete lsr operon, but all natural strains encode the AI-2
synthase (luxS) (these results do not change with the addition
of 13 new complete genomes that became available during
manuscript revision). The frequency of the complete operon is
38%. This intermediate frequency is very rare in E. coli, where
most gene families are at frequencies higher than 90% or
lower than 20% (Touchon et al. 2009). The close similarity
in topology between the phylogenetic tree built with the sequences
of the genes in the lsr operon and the species tree
reconstructed with the core genome (supplementary fig. S1,
Supplementary Material online) indicates that although recombination
and lateral transfer cannot be ruled out, the
prevalent phylogenetic signal in this locus is that of the species.
Hence, the simplest explanation for the observed polymorphic
pattern is that the operon is ancestral to all E. coli, and we can
therefore use the presence/absence of genes to infer ancestral
states. A maximum-likelihood reconstruction of ancestral
states for the presence/absence of a completely functional
lsr operon suggests that the operon was present in the last
common ancestor of all E. coli (including Shigella) (proportional
likelihood = 0.60; this is the scaled likelihood, so that
the likelihoods of both states add to 1). This is further supported
by the taxonomic distribution of the lsr operon,
which is found in many Gammaproteobacteria, including
most close relatives of E. coli such as E. fergusonii and the
new E. coli clades that are outside the "typical" E. coli strains
(Luo et al. 2011; Oh et al. 2012). This operon is also found in
the closely related genera Salmonella, Yersinia, Enterobacter, and
Klebsiella. Hence, lsr genes lacking in extant E. coli genomes
are most likely the result of gene losses.

We observed no significant correlation between the presence/absence
of a complete lsr operon and pathogenicity in
general (Pagel's 1994 test, P = 0.449). However, the pathovars
that are known to replicate within macrophages (all
Shigella and all adherent invasive E. coli [AIEC] strains;
Glasser et al. 2001) lack the full lsr operon (Pagel's 1994
test, P = 0.048). The majority of EXPEC strains investigated
here (seven of eight) also lack the full operon (Pagel's 1994
test, P = 0.080). EXPEC strains are extraintestinal pathogenic
E. coli that are part of a healthy intestinal microbiota but can
become virulent in extraintestinal environments (Wold et al.
1992; Nowrouzian et al. 2005; Moreno et al. 2009). These
results suggest an association between patterns of polymorphism
at the lsr operon and relevant phenotypic traits differentiating
E. coli strains.

Processes of Pseudogenization 

To gain further insight into the process leading to loss of the lsr
operon, we studied its patterns of pseudogenization. The
maximum-likelihood reconstruction of ancestral states suggests
the occurrence of at least eight independent losses of
function in the lsr operon during the evolution of the E. coli
clade (supplementary fig. S2, Supplementary Material online).
Once initiated, the process of pseudogenization is fast and
does not follow a fixed gene order. For instance, closely
related strains such as the ones in phylogroup B1 can differ
substantially in presence/absence of genes within the operon,
as demonstrated by the two enterohemorrhagic strains (E. coli
O26:H11 11368 and E. coli O111:H- 11128, fig. 2). Given the






Brito et al.






rapidity and complexity of gene degradation upon operon
inactivation, it is not always possible to identify the event triggering
the pseudogenization process. For instance, a large
truncation does not preclude the pre-existence of a frameshift
or of a nonsense mutation. Nevertheless, interesting patterns
emerge from the comparison of different genomes. Namely,
the presence of insertion sequences (ISs, 10 occurrences) and
phage-related sequences (two occurrences) in the flanking
regions of the lsr operon is only observed in genomes with
an inactivated operon that are still in the process of pseudogenization,
suggesting that these elements may be involved in
rapid changes of gene content (supplementary table S1,
Supplementary Material online). Among the 34 observed
pseudogenes, 26 (76.5%) correspond to large truncations
(>100 bp deletions), 6 (17.6%) to small indels (usually 1 or
2 bp insertions/deletions that cause disruption of the reading
frame, but also a 7-bp deletion), and finally 2 (5.9%) were
caused by single point mutations that lead to a premature termination
codon (supplementary table S1, Supplementary
Material online).

We estimated the probability that the polymorphic pattern
of presence/absence of the complete operon could have originated
by neutral processes alone. To test this hypothesis, we
generated 1,000 binary characters under a neutral process on
the clade phylogeny and then determined the probability of
obtaining the level of polymorphism observed in the data. We
considered a model of evolution that assumes asymmetrical
transition rates where the forward rate corresponds to the
rate of gene inactivation and the backward rate corresponds
to the rate of regaining a functional gene that was previously
inactivated (see Materials and Methods). Among the many
mechanisms that lead to gene inactivation, we considered
point mutations, small insertions and deletions, large truncations,
and IS transposition. To parameterize our model, we
used published data. Single point mutations represented approximately
62.7% of the total mutational events causing
gene inactivation in a recent large-scale evolution
experiment (Tenaillon et al. 2012). The mutation rate of
E. coli was recently estimated to be $8.9 \times 10^{-11}$ per nucleotide
per generation (Wielgoss et al. 2011). Together, these
values lead to a rate of gene inactivation (forward rate) of
approximately $1.17 \times 10^{-6}$ mutational events per generation
for the lsr operon, considering that the total coding sequence
is 8,271 bp.
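
The forward rate quoted above can be reproduced from the three published figures. The sketch below shows one reading of how they combine, namely that the point-mutation rate over the whole coding sequence is scaled up by the fraction of inactivating events that are point mutations; this reading, the function name, and the constant names are assumptions, and μ is taken as 8.9 × 10^-11 (Wielgoss et al. 2011).

```python
MU = 8.9e-11            # mutations per nucleotide per generation
OPERON_LENGTH_BP = 8271 # total coding sequence of the lsr operon
POINT_FRACTION = 0.627  # share of inactivating events that are point mutations


def inactivation_rate(mu: float, length_bp: int, point_fraction: float) -> float:
    """Total rate of inactivating events per generation for the operon.

    Point mutations occur at rate mu * length_bp; dividing by their share
    among all inactivating events extrapolates to the total rate.
    """
    return mu * length_bp / point_fraction


forward_rate = inactivation_rate(MU, OPERON_LENGTH_BP, POINT_FRACTION)
print(f"forward rate: {forward_rate:.3g} events per generation")
```

Under these assumptions the result is close to 1.17 × 10^-6 events per generation, the forward rate used for the neutral models.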

The backward rate represents the rate of back mutation 
that overcomes single point mutations, the rate of insertion/ 
deletion that reconstitutes a previous indel, the rate of excision 
of an IS, and the rate of gene conversion and lateral gene 
flow. The rate of back mutation will be negligible in relation 
to the rate of gene inactivation due to stop codons. The rate of 
excision of transposons is likely to be at least one order of 
magnitude smaller than the insertion rate (Charlesworth 
1985). Lateral gene transfer probably occurs at a highly vari- 
able rate and is potentially the most important process that 



can restore a functional operon. In the absence of estimations 
of these rates in natural setups, we tested multiple backward 
rates ranging four orders of magnitude (table 1). With all 
branches calibrated in units of generations (see Materials 
and Methods), we tested whether a neutral model could ex- 
plain the data by contrasting the polymorphism level gener- 
ated under neutrality and a similar model with rate parameters 
estimated from the data (forward rate = 6.89 x 10~^ and 
backward rate = 4.00 x 10~^). From all parameters tested, 
model 2x is the only model that generates a distribution of
polymorphism with a median and a variance that includes the
empirical level of polymorphism (0.38); however, this result
implies a point mutation rate that is three orders of magnitude
smaller than what was estimated empirically (Wielgoss et al.
2011). This is consistent with the interpretation that there is
ongoing selection to maintain a functional operon in some
strains of E. coli. Importantly, the level of polymorphism obtained
by simulating 1,000 characters under a model with
parameters estimated from the data is significantly different
from the distribution generated with model 2x (forward
= $1.17 \times 10^{-6}$, backward = $5.80 \times 10^{-7}$), mainly due
to the large variance of the former (Kolmogorov-Smirnov
test, D = 0.2352, P value < 0.001, table 1).
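
A useful sanity check on the simulated medians in table 1 is the stationary distribution of the two-state Markov model, which depends only on the ratio of forward to backward rates. The Python sketch below computes it; the function name is hypothetical, the forward rate of 1.17 × 10^-6 is the neutral value derived in the text, and the ratios are those listed in table 1.

```python
def equilibrium_functional_frequency(forward: float, backward: float) -> float:
    """Stationary frequency of state 0 (functional operon).

    For a two-state chain with rate `forward` (0 -> 1) and `backward`
    (1 -> 0), the stationary frequency of state 0 is
    backward / (forward + backward), i.e., 1 / (1 + forward/backward).
    """
    return backward / (forward + backward)


FORWARD = 1.17e-6                       # neutral forward rate from the text
RATIOS = [0.01, 0.1, 1, 2, 5, 10, 100]  # forward/backward ratios in table 1

for ratio in RATIOS:
    freq = equilibrium_functional_frequency(FORWARD, FORWARD / ratio)
    print(f"model {ratio}x: equilibrium frequency of state 0 = {freq:.3f}")
```

The stationary values track the simulated medians reasonably well (for example, one third for a ratio of 2), although exact agreement is not expected because the simulations run on a finite tree with a limited number of tips.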

Phenotypic Characterization of Gene Losses 

We made all single-gene deletions of the lsr operon as well as
of two genes in the biosynthetic pathway of AI-2. We then
determined the phenotypic effects of all deletions in terms of
AI-2 accumulation and internalization in E. coli K-12 MG1655.
AI-2 internalization profiles clearly show that every gene
knockout leads to a measurable phenotypic effect, although
some cause stronger phenotypic effects than others (fig. 3).
In particular, mutants in the genes of the ABC transporter
(lsrB, lsrA, lsrC, and lsrD) are less efficient in removing AI-2
from the extracellular medium, whereas the kinase mutant
(lsrK) does not internalize the signal. The lsrG mutant, which
is less efficient in degrading the inducer of the system (AI-2-P),
and the repressor mutant (lsrR) are the only mutants that
show premature AI-2 internalization. These results are consistent
with the previous studies on the characterization of the
lsr operon and its regulation (Ren et al. 2004; Xavier and
Bassler 2005a, 2005b; Li et al. 2007; Pereira et al. 2009),
but here we show the phenotype of extracellular AI-2 accumulation
for all the lsr single mutants. Our results show that
extracellular AI-2 concentration is affected by every single-gene
deletion. This can affect how the different cells in
the population sense the signal and will therefore affect
the selective pressure acting on each gene deletion.

Extrapolating the phenotypic effects of single knockout 
mutants obtained by genetic manipulation to the genotypes 
observed in the natural isolates suggests that gene deletions in 
natural strains decrease or abolish AI-2 internalization. The 
former corresponds to the two enterotoxigenic E. coli strains
Table 1

Distributions of Presence/Absence Polymorphism Generated under Diverse Evolutionary Models of Character Evolution

Model             Forward Rate     Backward Rate    Ratio   Median   Variance   Kolmogorov-Smirnov Test
Empirical data    6.89 × 10^[…]    4.00 × 10^[…]    1.72    0.356    0.0223
Model 0.01x       1.17 × 10^[…]    1.17 × 10^[…]    0.01    1.000    0.0002     D = 1, P < 0.001
Model 0.1x        1.17 × 10^[…]    1.17 × 10^[…]    0.1     0.911    0.0019     D = 0.994, P < 0.001
Model 1x          1.17 × 10^[…]    1.17 × 10^[…]    1       0.511    0.0057     D = 0.5245, P < 0.001
Model 2x*         1.17 × 10^[…]    5.80 × 10^[…]    2       0.333    0.0054     D = 0.2352, P < 0.001
Model 5x          1.17 × 10^[…]    2.34 × 10^[…]    5       0.178    0.0033     D = 0.6617, P < 0.001
Model 10x         1.17 × 10^[…]    1.17 × 10^[…]    10      0.089    0.0021     D = 0.8498, P < 0.001
Model 100x        1.17 × 10^[…]    1.17 × 10^[…]    100     0.000    0.0003     D = 0.987, P < 0.001

Note. Forward and backward rates are model parameters (see text), and the generated distributions are characterized by their median and variance of the relative
frequency of strains with complete operon. These distributions were tested against the distribution generated with empirical parameters with a two-sample
Kolmogorov-Smirnov test. The only model that generates a distribution of polymorphism that includes the empirical value (0.38) is marked with an asterisk.




Fig. 3. Extracellular AI-2 profiles of single-mutant knockouts cultured in LB at 37°C (left) and the respective absolute growth curves (right). The
top panels show results for single-gene knockout mutants of the lsr operon, and the bottom panels show results for knockout mutants of the genes involved
in the AI-2 production pathway (pfs and luxS). The x-axis is time (hours). WT, wild type.



(E. coli O139:H28 E24377A and E. coli UMNK88). These
strains have an impaired Lsr transporter, but because a
functional kinase and repressor are present, these strains are
expected to remain capable of internalizing the signal,
albeit at a lower rate via a less efficient system (Pereira et al.
2012). On the other hand, all other natural strains without a
complete lsr operon lack a functional lsrK, which is sufficient
to prevent any decrease of the extracellular AI-2
concentration. LsrK is responsible for producing the inducer of the
system (AI-2-P), and in its absence, even if the receptor gene
(lsrB) is present, the protein is not expressed; hence, lsrK
mutants are impaired not only in AI-2 internalization but also in
AI-2 sensing. The loss of lsrR leads to constitutive expression
of the lsr operon and thus to a very low accumulation of AI-2
in the extracellular medium because all AI-2 that is produced is
also internalized. Interestingly, the lsrR gene is absent only in
the genomes that also lack a functional ABC transporter
(LsrACDB) and the corresponding signal kinase (LsrK).
Therefore, in these genomes as well, we would predict an overall
phenotypic effect that is similar to a complete absence of the
lsr operon, with no internalization of AI-2 by those cells.
Hence, we do not expect a more efficient removal of the
AI-2 signal from the extracellular medium for any of the
natural isolates analyzed.

Interestingly, in monocultures, and under the experimental
conditions tested, the luxS and all lsr mutants show small, if
any, differences in growth rates (fig. 3). This suggests a weak
metabolic fitness cost for QS and contrasts with deletion
of the pfs gene, which encodes the enzyme upstream of LuxS
responsible for the degradation of SAH (fig. 1); this deletion
carries major fitness costs to the cell (fig. 3). This is important
given that both pfs and luxS mutants have the same key effect
of not producing AI-2 (flat curves of fig. 3).

Genetic Diversity and Levels of Selection 

Pairwise nucleotide diversity (π) varies across the genes in the
operon (fig. 4 and table 2). It is particularly high in the lsrA
gene, which encodes the ABC-ATPase that provides the
energy necessary for the internalization of AI-2 into the cell,
and in its two flanking genes lsrR and lsrC, but in every case,
the levels of diversity are within the range of those observed in
the core genome, even when controlling for similar codon
usage bias (fig. 4). The level of diversity for nonsynonymous
substitutions is much lower than for synonymous substitutions
in all genes, consistent with some degree of evolutionary
constraint (table 2). This is corroborated by the pattern of
intermediate levels of codon usage bias recovered from the
several measures used (table 2).
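
To make the statistic concrete, the following minimal sketch computes π as the average number of pairwise differences per site across an alignment. It is an illustration only: the function name and the toy alignment are hypothetical, gaps and ambiguous bases are not handled, and it is not the software used for the estimates in table 2.

```python
from itertools import combinations


def nucleotide_diversity(alignment: list[str]) -> float:
    """Average pairwise differences per site (pi) for equal-length sequences."""
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")

    pairs = list(combinations(alignment, 2))
    # Count mismatching sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(seq1, seq2)) for seq1, seq2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment: 4 sequences, 10 sites.
toy = ["ATGCATGCAT",
       "ATGCATGCAT",
       "ATGAATGCAT",
       "ATGAATGCTT"]
print(round(nucleotide_diversity(toy), 4))
```

In the toy alignment, the six sequence pairs differ at 7 sites in total, so π = 7/(6 × 10). Restricting the same calculation to synonymous or nonsynonymous sites would require a codon-aware site count, which this sketch does not attempt.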

Nonsynonymous substitution patterns do not differ significantly
between complete genes in functional operons and
complete genes in partly inactivated operons, contrary to what
would be expected for genes under relaxed purifying selection (fig. 4).
This suggests that either the cell is still using these genes or
the process of pseudogenization is too recent to have left a
signature in the patterns of substitution. The gene lsrG, which
encodes one of the two enzymes responsible for processing of
AI-2-P, is an exception, as it shows high nucleotide diversity
caused by an increased number of nonsynonymous
substitutions (fig. 4, white triangles). This gene seems to be always the
last to pseudogenize and hence would have more time to
accumulate changes.

To test for signatures of selection shaping levels and
patterns of diversity, we performed McDonald-Kreitman
and Tajima's D tests (table 3). Tajima's D statistic was not
significantly different from zero in any of the genes analyzed.
Tajima's D is highly sensitive to population demography
(Simonsen et al. 1995), and it tends to assume negative
values in expanding populations. Because it was proposed
that the E. coli population underwent a large recent
expansion (Wirth et al. 2006), the expected value of D is negative
(and not zero) based on demographic effects. We therefore
used McDonald-Kreitman tests to analyze the patterns of
selection in this work because this test is much more
robust to demographic effects (Nielsen 2005). Both LuxS (AI-2
synthase) and LsrA have a pattern of diversity that rejects
the null model of neutrality (McDonald-Kreitman test,
table 3) but for very different reasons. The gene luxS shows
levels of nonsynonymous polymorphism close to the average
gene but much smaller nonsynonymous divergence (after
normalization by synonymous substitutions, fig. 5). This is
suggestive of strong purifying selection on nonsynonymous
polymorphisms. On the other hand, nonsynonymous
polymorphisms are much more abundant in lsrA than in all the
other genes after normalization by synonymous rates (PS and
DS). The pattern for lsrA suggests the maintenance of high
nonsynonymous polymorphism segregating in the population.
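
The quantities behind the McDonald-Kreitman test can be reproduced from the four counts listed in table 3. The sketch below is a minimal illustration, not the software used in this study. It computes the neutrality index NI = (Pn/Ps)/(Dn/Ds), α = 1 - NI, the direction of selection DoS = Dn/(Dn + Ds) - Pn/(Pn + Ps), and a two-tailed Fisher exact P value for the 2 × 2 table. The function names are hypothetical, and the two-tailed P value is assumed to sum the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def mk_summary(dn: int, ds: int, pn: int, ps: int):
    """Return (NI, alpha, DoS); NI and alpha are None if Dn, Ds, or Ps is zero."""
    if dn + ds == 0 or pn + ps == 0:
        raise ValueError("need at least one fixed and one segregating change")
    dos = dn / (dn + ds) - pn / (pn + ps)
    if dn == 0 or ds == 0 or ps == 0:
        return None, None, dos
    ni = (pn / ps) / (dn / ds)
    return ni, 1.0 - ni, dos


def fisher_exact_two_tailed(dn: int, ds: int, pn: int, ps: int) -> float:
    """Two-tailed Fisher exact P value for the table [[dn, ds], [pn, ps]]."""
    row1 = dn + ds            # fixed differences
    col1 = dn + pn            # nonsynonymous changes
    col2 = ds + ps            # synonymous changes
    total = comb(col1 + col2, row1)

    def weight(k: int) -> int:
        # Number of tables with k nonsynonymous fixed differences (exact integer).
        return comb(col1, k) * comb(col2, row1 - k)

    observed = weight(dn)
    k_min = max(0, row1 - col2)
    k_max = min(row1, col1)
    extreme = sum(weight(k) for k in range(k_min, k_max + 1)
                  if weight(k) <= observed)
    return extreme / total


# Counts (Dn, Ds, Pn, Ps) as listed in table 3.
TABLE3_COUNTS = {"lsrK": (9, 80, 14, 71),
                 "lsrA": (11, 84, 34, 87),
                 "luxS": (0, 14, 7, 15)}

for gene, counts in TABLE3_COUNTS.items():
    ni, alpha, dos = mk_summary(*counts)
    p_value = fisher_exact_two_tailed(*counts)
    print(gene, ni, alpha, round(dos, 3), round(p_value, 3))
```

With these counts, the sketch gives NI of about 1.753 for lsrK and 2.984 for lsrA, and P of about 0.029 for luxS, in line with table 3. NI and α are undefined for luxS because Dn is zero. Comparing integer table weights rather than floating-point probabilities avoids rounding problems when deciding which tables are as extreme as the observed one.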



Fig. 4. Genetic diversity at the lsr operon. Black squares indicate total nucleotide diversity (mean and standard deviation) for each gene in the operon.
Triangles are nucleotide diversity at nonsynonymous sites estimated in complete genes for strains with the full lsr operon (black) and for strains with partial lsr
operons (white). For comparison, the total nucleotide diversity and nonsynonymous diversity of core genes with similar codon usage were added. The x-axis
lists the genes lsrK, lsrR, lsrA, lsrC, lsrD, lsrB, lsrF, and lsrG, followed by the core genome.
Table 2

Genetic Diversity and Codon Usage

Gene    L      s    N   h   π total (mean)  π total (SD)  π S    π NS   No. of codons  ENC     CBI    CAI    eCAI   CAI/eCAI

Strains with complete lsr operon
lsrK    1,593  85   15  10  0.014           0.0023        0.047  0.003  530            51.728  0.325  0.315  0.337  0.935
lsrR    954    58   15  9   0.018           0.0017        0.067  0.002  317            53.143  0.375  0.302  0.325  0.929
lsrA    1,533  116  15  12  0.022           0.0026        0.059  0.009  511            46.361  0.433  0.322  0.328  0.982
lsrC    1,029  74   15  11  0.019           0.0025        0.058  0.005  342            48.842  0.440  0.333  0.310  1.075
lsrD    993    35   15  11  0.011           0.0013        0.035  0.002  330            45.614  0.433  0.295  0.302  0.976
lsrB    1,023  40   15  12  0.011           0.0014        0.040  0.002  340            49.022  0.331  0.363  0.349  1.040
lsrF    876    47   15  12  0.014           0.0017        0.047  0.004  291            44.400  0.518  0.344  0.343  1.003
lsrG    291    15   15  8   0.011           0.0026        0.043  0.002  96             48.112  0.504  0.489  0.398  1.228

All strains
luxS    516    21   40  20  0.010           0.0005        0.029  0.003  171            45.907  0.517  0.497  0.366  1.359

Note. L is sequence length, s is the number of segregating sites, N is sample size, h is the number of unique haplotypes, and π is nucleotide diversity (mean and
standard deviation). Nucleotide diversity is presented for total sequence, synonymous sites (S), and nonsynonymous sites (NS) only. ENC is the effective number of codons, CBI
is the Codon Bias Index, CAI is the Codon Adaptation Index, and eCAI is the expected value of CAI. Genes in the lsr operon are ordered following position on the
chromosome.



Table 3

Neutrality Tests

                          McDonald and Kreitman test
Gene    Tajima's D   Dn   Ds   Pn   Ps   P         α        NI      DoS

Strains with complete lsr operon
lsrK    -0.7462      9    80   14   71   0.265     -0.753   1.753   -0.064
lsrR    -0.2277      3    23   6    53   1.000     0.132    0.868   0.014
lsrA    -0.5846      11   84   34   87   0.004**   -1.984   2.984   -0.165
lsrC    -0.6240      4    36   15   60   0.198     -1.250   2.250   -0.100
lsrD    0.0367       5    41   4    31   1.000     -0.058   1.058   -0.006
lsrB    -0.4743      6    28   9    32   0.775     -0.313   1.313   -0.043
lsrF    -0.7434      1    8    12   36   0.668     -1.667   2.667   -0.139
lsrG    -1.3815      0    2    2    14   1.000     NA       NA      -0.125

All strains
luxS    -0.1734      0    14   7    15   0.029*    NA       NA      -0.318

Note. Ds and Dn are fixed synonymous and nonsynonymous substitutions; Ps and Pn are segregating synonymous and nonsynonymous substitutions. P value (two
tailed) is the significance level of the Fisher exact test that Dn/Ds = Pn/Ps. α = 1 - NI, where NI is the neutrality index and α is the proportion of evolutionary change that is due to
positive selection; these values cannot be computed when Dn is zero (NA). DoS is direction of selection. Divergence was estimated between E. coli strains and the closest
outgroup (E. fergusonii).



The pattern of high dN/dS between closely related strains
for lsrA might arise simply from the presence of newly created
slightly deleterious mutations that have not yet had time to be
purged (Rocha et al. 2006). However, the high effective
population size of E. coli should lead to a rapid elimination of these
mutations. Indeed, we found that dN/dS falls rapidly to low
values in the core genome (supplementary fig. S3,
Supplementary Material online). When we compare the average
trend of dN/dS over patristic distances with that of lsrA, we
find much higher values for the latter, suggesting that slow
purifying selection of slightly deleterious changes is not driving
the differences between the two sets. This trend could result
from a lower number of deleterious nonsynonymous changes
in this gene relative to the core. However, when we compared
E. coli with E. fergusonii, there was no difference in dN/dS
between lsrA and the core genome. The observation of an
excess of dN/dS in lsrA relative to the core genome within the
species, together with identical dN/dS in the two sets when
comparisons are made between species, fits the hypothesis of
diversifying selection for lsrA very well.

Figure 6 shows, for lsrA and for luxS, a sliding window
analysis of the ratio of nonsynonymous to synonymous
substitutions segregating within E. coli and between E. coli and
E. fergusonii. The asterisks indicate the approximate location
in the gene of codons where a significant signal of selection was
detected with the codon-based approach. Three significant
codons distributed throughout the gene (positions 76, 181,
and 482) were detected for lsrA, whereas only one codon
(position 145) was identified for luxS.
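
The windowing scheme itself is simple to sketch. Assuming the 30-bp window and 15-bp step given in the legend of figure 6, the illustrative helpers below (hypothetical names, 0-based half-open coordinates, partial trailing windows dropped) enumerate the windows along a gene and report which of them cover a given codon. They show only the bookkeeping, not the estimation of the substitution ratios.

```python
def sliding_windows(length: int, size: int = 30, step: int = 15):
    """Return (start, end) pairs, 0-based and half-open, fully inside the sequence."""
    return [(start, start + size) for start in range(0, length - size + 1, step)]


def windows_covering_codon(codon: int, windows):
    """Indices of windows that contain all three bases of a 1-based codon number."""
    first = 3 * (codon - 1)          # 0-based position of the codon's first base
    return [i for i, (start, end) in enumerate(windows)
            if start <= first and first + 3 <= end]


# lsrA is 1,533 bp long (table 2); codon 76 is one of the reported targets.
lsra_windows = sliding_windows(1533)
print(len(lsra_windows), windows_covering_codon(76, lsra_windows))
```

Because the step is half the window size, most codons fall inside two overlapping windows, which is why a single selected codon can raise the ratio in neighboring windows of such a plot.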

Discussion 

We have found that E. coli exhibits a gene repertoire polymorphism
in the lsr operon. We have experimentally shown
that such polymorphism leads to cells lacking the ability to 
bind, internalize, and/or process the QS signal. However, all 
Fig. 5. Nonsynonymous to synonymous rates across genes in the lsr operon and luxS. Intraspecific comparisons correspond to nucleotide diversity
for synonymous and nonsynonymous sites (black squares), whereas comparisons with the outgroup (Escherichia fergusonii) correspond to synonymous and
nonsynonymous nucleotide divergence (triangles). Black triangles are average nucleotide differences between species, whereas white triangles correspond to
net divergence, which subtracts from the former statistic the average nucleotide differences for polymorphic sites. Error bars are standard errors estimated
after 1,000 bootstrap replicates.



strains maintain a functional LuxS, the synthase of the QS
signal, even though the fitness cost of its deletion in
monocultures is as low as that of many lsr genes. Overall, the
evolution of the genes essential for regulating AI-2 concentration
was shown here to be complex and non-neutral. Fifty-eight
percent of E. coli strains cannot regulate extracellular AI-2
concentrations: 23 of the 40 strains analyzed lack a functional
LsrK. These strains produce AI-2 but cannot sense or remove
AI-2, whether produced by themselves or by others. We did not
find any natural strain that lacks the LsrR repressor and still has a
potentially functional operon. This suggests counterselection
of strains that could be more efficient at removing AI-2.
Hence, the overall phenotypic effect of the observed operon
pseudogenization is always toward the decrease or total
abolition of AI-2 internalization and removal from the
environment. Importantly, the comparative genomic analyses indicate
that this functional polymorphism is maintained by natural
selection.

We find signatures both of selection to lose the operon and of
selection to maintain it, creating a balanced polymorphism at
the level of gene content. This leads to a frequency of the lsr
operon intermediate between that of persistent and of volatile
genes (van Passel et al. 2008; Kuo and Ochman 2009;
Touchon et al. 2009). The selective pressure to lose the
operon is supported by the inference of at least eight
independent events of operon inactivation and the observation
that pseudogenization, when it occurs, is too fast to be a
neutral process. Such fast gene extinction dynamics were
already observed in the genomes of Salmonella enterica serovar
Gallinarum (Kuo and Ochman 2010) and occur through
the same general mechanisms described for bacterial
pseudogene formation (e.g., large truncations, small frameshift
indels, and stop codons) (Lerat and Ochman 2005; Ochman
and Davalos 2006). However, our data differ from those of Kuo
and Ochman (2010) in that all genes are inferred to be ancestral
and pseudogenization is shared among many strains, some of
which are very distantly related. In that study, the authors analyzed
147 pseudogenes, of which only five were shared with the
most closely related strain and only three were inferred to be
ancestral. Our data fit models of balancing selection better than
a random model of gene loss, even though we cannot exclude
the possibility that pseudogenization of one gene accelerates
the loss of the other genes in the operon. Selection to
maintain the ability to respond to extracellular AI-2 is suggested by
the balancing selection patterns in the lsrA gene, as well as by the
polymorphism observed in all other lsr genes of complete operons,
which is typical of functional genes. In addition, we inferred
through simulation that the majority of the E. coli strains
should have already lost the operon unless there is ongoing
selective pressure to maintain it.

The mechanisms of selection maintaining these polymorphisms
are an important line of future research. Theoretical
Fig. 6. Sliding window analysis of the ratio of nonsynonymous (N) to synonymous (S) substitutions segregating within Escherichia coli (P) and between
E. coli and E. fergusonii (D) across the lsrA and luxS genes. Window size is 30 bp with 15-bp steps. Sample size is 15 for lsrA and 40 for luxS, corresponding to
strains with complete operons and all natural strains, respectively. Asterisks indicate the approximate position of codons identified as targets of selection (codons
76, 181, and 482 in LsrA, and codon 145 in LuxS).



models have shown that balancing selection may occur for
diverse reasons and could potentially be quite common
(Gillespie 2004). We propose two hypotheses, which are not
mutually exclusive, for the maintenance of polymorphisms in this
system. We observed that all Shigella and AIEC, which are
strains known to replicate within macrophages, lack the lsr
operon. The loss of the lsr operon in these strains could thus
be a consequence of pathovar-specific adaptation. This
fits the hypothesis that cooperative processes regulated by QS
are less important in bacteria with a low infectious dose that
are able to replicate in professional phagocytes (Gama et al.
2012). However, this intracellular niche adaptation hypothesis
cannot explain all losses observed in our data because many
other strains lack lsr.



The other hypothesis relates to the social consequences of
mutations in the genes regulating AI-2 QS. We found no
measurable growth cost for the loss of the AI-2 QS mechanism
(fig. 3), but in E. coli, AI-2 regulates costly group behaviors
such as virulence and biofilm formation (Gonzalez-Barrios
et al. 2006; Herzberg et al. 2006; Reisner et al. 2006; Zhu
et al. 2007; Lee et al. 2011). Hence, although QS mutations
have little direct metabolic effect, as growth is not affected in
monocultures, they are likely to have ecological benefits by
providing the cells with the ability to exploit social processes in
microbiomes. In most QS systems of Gram-negative bacteria,
high cell densities are associated with high concentrations of
the signal. Individuals that have lost the ability to produce
the signal but still benefit from the information of producers
(emitters) are thus noncooperating and have a fitness
advantage (Diggle et al. 2007); in these systems, receptors are much
more abundant than emitters (Patankar and Gonzalez 2009).
In contrast, in the E. coli AI-2 QS system, we observed gene
repertoire polymorphism at the level of the signal receptor
(lsr operon) and not at the signal emitter (luxS). All these
gene losses reduce or abolish AI-2 reception and
internalization but do not reduce AI-2 production. This can be
interpreted as coercive behavior, which is a particular type of
social cheating, if it is demonstrated that these cells benefit
from forcing the nearby cells into cooperative behaviors while
themselves refraining from cooperating (Diggle et al. 2007;
Foster et al. 2007). Consistently, it was shown that E. coli lsr
mutants can induce the onset of cooperative behaviors of
Vibrio harveyi and V. cholerae even when they are at low
quorum (Xavier and Bassler 2005a).

Because AI-2 has been shown to regulate biofilm formation
and virulence traits in E. coli, it is expected that cells that
do not internalize AI-2 but still contribute to its increased
concentration in the extracellular medium will prompt the
remaining AI-2-sensitive cells in the vicinity to hasten the
onset of the behaviors they regulate with AI-2. Van Dyken and
Wade (2012) recently predicted that when social
cheaters are maintained in natural populations as an
evolutionarily stable strategy, cheaters should be characterized
by large insertions/deletions,
frameshift mutations, or premature stop codons; these are
features that characterize the lsr operons of E. coli natural
variants that do not regulate extracellular AI-2.

The interpretation that gene repertoire polymorphism in
the lsr operon is maintained through a process of social
evolution is further strengthened by the observation that luxS, the
signal synthase gene, is present in all strains (fig. 2) even though we
detect no fitness effect in monocultures of the single-gene
knockout mutant (fig. 3). Because of its enzymatic role in recycling
products of SAM metabolism, it was suggested that the
selective pressure to maintain luxS was primarily to detoxify the
cell and recycle the products of SAM metabolism (Schauder
et al. 2001; Winzer et al. 2003; Vendeville et al. 2005; De
Keersmaecker et al. 2006; Hardie and Heurlier 2008).
Importantly, we show that a mutant in Pfs, the enzyme
immediately upstream of LuxS in the metabolism of SAM, does
show a marked growth defect (fig. 3), probably due to the
toxic accumulation of SAH (Schauder et al. 2001; Winzer et al.
2002). This strongly indicates that Pfs, not LuxS, is the major
enzyme responsible for preventing the toxic consequences of
SAH accumulation. The similarity in the phenotypic effects of
the two mutants on AI-2 production and their extreme difference
in fitness cost highlight the importance of pfs in central
metabolism and suggest that the
selective pressure to maintain a functional luxS in the cell is not
metabolic but social. Naturally, this conclusion has to be
contextualized in the whole discussion of AI-2 as a QS signal; a
social cost cannot be attributed to any gene that does not
present a strong metabolic cost.



We still lack a direct experimental demonstration of such a
social benefit. This is difficult to show for interspecific QS
because it requires experimentation in complex environments.
Nevertheless, it is known that in the vertebrate gut, E. coli
experiences a complex multispecies environment where the
ability to interact (or interfere) with other cells, of the same or
of different species, may influence the evolution of its AI-2
regulation system (McNab et al. 2003). Interestingly,
extraintestinal virulence of an E. coli AI-2 QS-negative strain (E. coli
B2S) was shown to be boosted in mixed-strain infections,
compared with pure-culture infections, when the strain was mixed
with an AI-2 QS-positive strain regarded as commensal
(E. coli MG1655) (Tourret et al. 2011). Hence, QS polymorphisms
might lead to exploitation of commensals by pathogens to
increase virulence.

Overall, our findings suggest that complex adaptations of
species with polyclonal interactions, such as E. coli, can be due
to genes maintained at intermediate frequencies rather than
to ubiquitous or pathovar-specific genes.

Supplementary Material 

Supplementary tables S1 and S2 and figures S1-S3 are
available at Genome Biology and Evolution online
(http://gbe.oxfordjournals.org/).